We describe three case studies illustrating the use of ACL2s to prove the correctness of optimized
reactive systems using skipping refinement. Reasoning about reactive systems using refinement
involves defining an abstract, high-level specification system and a concrete, low-level implementation
system. Next, one shows that the behaviors of the implementation system are allowed by the
specification system. Skipping refinement allows us to reason about implementation systems that can “skip”
specification states due to optimizations that allow the implementation system to take several
specification steps at once. Skipping refinement also allows implementation systems to stutter, i.e., to take
several steps before completing a specification step. We show how ACL2s can be used to prove
skipping refinement theorems by modeling and proving the correctness of three systems: a JVM-inspired
stack machine, a simple memory controller, and a scalar to vector compiler transformation.


1 Introduction 

Refinement is a powerful method for reasoning about reactive systems. The idea is that a simple
high-level abstract system acts as a specification for a low-level implementation of a concrete system. The
goal is then to prove that all observable behaviors of the concrete system are behaviors of the abstract
system. It is often the case that the concrete system requires several steps to match one high-level step
of the abstract system, a phenomenon commonly known as stuttering. Therefore, notions of refinement
usually directly account for stuttering [2, 5, 8]. However, in the course of engineering an efficient
implementation, it is often the case that a single step of the concrete system can correspond to several steps
of the abstract system, a phenomenon that is the dual of stuttering. For example, in order to reduce memory
latency and effectively utilize memory bandwidth, memory controllers often buffer requests to memory.
The pending requests in the buffer are analyzed for address locality and then, at some time in the future,
multiple locations in the memory are read and updated simultaneously. Similarly, to improve instruction
throughput, superscalar processors fetch multiple instructions in a single clock cycle. These instructions
are analyzed for instruction-level parallelism (e.g., the absence of data dependencies), and where possible
multiple instructions are executed in parallel and retired in a single clock cycle. In both of the above
examples, updating multiple locations in memory and retiring multiple instructions in a single clock cycle
result in a scenario where a single step in the optimized implementation may correspond to multiple steps
in the abstract system. A notion of refinement that only accounts for stuttering is therefore not appropriate
for reasoning about such optimized systems.

In our companion paper [10], we proposed skipping refinement, a new notion of correctness for
reasoning about optimized reactive systems, and a proof method that is amenable to mechanical reasoning.
The applicability of skipping refinement was shown using three case studies: a JVM-inspired stack
machine, an optimized memory controller, and a vectorizing compiler transformation. In [10] we focused
on finite-state models for the systems in the first two case studies and used model-checkers to verify

*This research was supported in part by DARPA under AFRL Cooperative Agreement No. FA8750-10-2-0233, by NSF 
grants CCF-1117184 and CCF-1319580, and by OSD under contract FA8750-14-C-0024. 


M. Kaufmann and D. Rager (Eds.): ACL2 Workshop 2015 (ACL2 2015). 
EPTCS 192, 2015, pp. 111-127, doi:10.4204/EPTCS.192.9


© Mitesh Jain & Panagiotis Manolios 
This work is licensed under the 
Creative Commons Attribution License. 







Proving Skipping Refinement with ACL2s 


skipping refinement. In this paper, we consider their corresponding infinite-state models and prove their
correctness in ACL2s, an interactive theorem prover Q. We also discuss in detail the modeling of the
vectorizing compiler transformation and its proof of correctness. In Section 2 we motivate the need for a
new notion of refinement with an example. In Section 3 we define well-founded skipping simulation.
In Section 4 we discuss the three case studies. We end the paper with conclusions and future work in
Section 5.

2 Motivating Example 

To illustrate the notion of skipping simulation, we consider an example of a discrete-time event
simulation (DES) system [10]. An abstract high-level specification of DES is described as follows. Let $E$ be
a set of events and $V$ be a set of variables. Then a state of the abstract DES is a three-tuple
$\langle t, Sch, A \rangle$, where $t$ is a natural number denoting the current time; $Sch$ is a set of pairs
$(e, t_e)$, where $e$ is an event scheduled to be executed at time $t_e \geq t$; and $A$ is an assignment to
the variables in $V$. The transition relation for the abstract DES system is defined as follows. If at time
$t$ there is no $(e, t) \in Sch$, i.e., there is no event scheduled to be executed at time $t$, then $t$ is
incremented by 1. Else, we (nondeterministically) choose and execute an event of the form
$(e, t) \in Sch$. The execution of an event may result in modifying $A$ and also adding a finite number of
new pairs $(e', t')$ to $Sch$. We require that $t' > t$. Finally, execution involves removing the
executed event $(e, t)$ from $Sch$.

Now, consider an optimized, concrete implementation of the abstract DES system. As before, a
state of the concrete system is a three-tuple $\langle t, Sch, A \rangle$. However, unlike the abstract
system, which just increments time by 1 when no events are scheduled for the current time, the optimized
system uses a priority queue to find the next event to execute. The transition relation is defined as
follows. An event $(e, t_e)$ with the minimum time is selected, $t$ is updated to $t_e$, and the event $e$ is
executed, as in the abstract DES.

Notice that when no events are scheduled for execution at the current time, the optimized
implementation of the discrete-time event simulation system can run faster than the abstract specification system
by skipping over abstract states. This is not a stuttering step, as it results in an observable change in the
state of the concrete DES system ($t$ is updated to $t_e$). Also, it does not correspond to a single step of the
specification. Therefore, it is not possible to prove that the implementation refines the specification using
notions of refinement that only allow stuttering E [13], because that just is not true. But, intuitively,
there is a sense in which the optimized DES system does refine the abstract DES system. The notion
of skipping refinement proposed in [10] is an appropriate notion to relate such systems: a low-level
implementation that can run slower (stutter) or run faster (skip) than the high-level specification.
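The gap between the two DES systems can be made concrete with a small executable sketch. The following Python code is an illustration only and is not part of the ACL2s development: the function names, the toy event effect (counting executions), and the idle tick of the optimized system when nothing is scheduled are all assumptions, and events here never schedule new events.

```python
def run_event(a, name):
    """Toy event effect (an assumption): count how often each event ran."""
    new_a = dict(a)
    new_a[name] = new_a.get(name, 0) + 1
    return new_a


def abstract_step(state):
    """Abstract DES: run one event due at the current time, else tick."""
    t, sch, a = state
    due = sorted(ev for ev in sch if ev[1] == t)
    if not due:
        return (t + 1, sch, a)
    name, _ = due[0]  # deterministic pick; the specification allows any due event
    return (t, sch - {due[0]}, run_event(a, name))


def optimized_step(state):
    """Optimized DES: jump to the earliest scheduled event and run it."""
    t, sch, a = state
    if not sch:
        return (t + 1, sch, a)  # assumption: idle tick when nothing is scheduled
    name, te = min(sch, key=lambda ev: (ev[1], ev[0]))
    return (te, sch - {(name, te)}, run_event(a, name))


def abstract_steps_to(start, target, bound=100):
    """Number of abstract steps needed to reach target, or None within bound."""
    state = start
    for n in range(bound + 1):
        if state == target:
            return n
        state = abstract_step(state)
    return None


s0 = (0, frozenset({("e1", 3), ("e2", 5)}), {})
u = optimized_step(s0)            # one concrete step: time jumps from 0 to 3
steps = abstract_steps_to(s0, u)  # three ticks plus one event execution
print(u[0], steps)
```

In this run a single optimized step is matched only after several abstract steps, and the number of steps grows with the distance to the earliest scheduled event, which is why no stuttering-only notion of refinement relates the two systems.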

3 Skipping Refinement 

In this section, we first present the notion of well-founded skipping simulation [10]. The notion is
defined in the general setting of labeled transition systems (TS) where labeling is on states.* We also place
no restriction on the state space sizes and the branching factor, and both can be of arbitrary infinite
cardinalities. The generality of TS is useful to model systems that may exhibit unbounded nondeterminism,
for example, modeling a program in a language with a random assignment command $x := ?$, which sets $x$ to
an arbitrary integer l!Ti|.

* Note that labeled transition systems are also used in the literature to refer to transition systems where
transitions (edges) are labeled. However, we prefer to work with TS where states are labeled.
We first describe the notational conventions used in the paper. Function application is sometimes
denoted by an infix dot "." and is left-associative. For a binary relation $R$, we often use the infix notation
$xRy$ instead of $(x, y) \in R$. The composition of relation $R$ with itself $i$ times (for $0 < i \leq \omega$)
is denoted $R^i$ ($\omega = \mathbb{N}$ and is the first infinite ordinal). Given a relation $R$ and
$1 < k \leq \omega$, $R^{<k}$ denotes $\bigcup_{1 \leq i < k} R^i$ and $R^{\geq k}$ denotes
$\bigcup_{\omega > i \geq k} R^i$. Instead of $R^{<\omega}$ we often write the more common $R^+$. $\uplus$
denotes the disjoint union operator. Quantified expressions are written as $\langle Qx : r : p \rangle$, where
$Q$ is the quantifier (e.g., $\exists$, $\forall$), $x$ is the bound variable, $r$ is an expression that denotes
the range of $x$ (true, if omitted), and $p$ is the body of the quantifier.

Definition 1 A labeled transition system (TS) is a structure $\langle S, \rightarrow, L \rangle$, where $S$ is
a non-empty (possibly infinite) set of states, $\rightarrow \subseteq S \times S$ is a left-total transition
relation (every state has a successor), and $L$ is a labeling function: its domain is $S$ and it tells us what
is observable at a state.


Skipping refinement is defined based on well-founded skipping simulation, a notion that is amenable
to mechanical reasoning. This notion allows us to reason about skipping refinement by checking mostly
local properties, i.e., properties involving states and their successors. The intuition is that, for any pair
of related states $s, w$ and a state $u$ such that $s \rightarrow u$, there are four cases to consider
(Definition 2): (a) either we can match the move from $s$ to $u$ right away, i.e., there is a $v$ such that
$w \rightarrow v$ and $u$ is related to $v$, or (b) there is stuttering on the left, or (c) there is stuttering
on the right, or (d) there is skipping on the left.


[Figure: diagrams of the four cases (a), (b), (c), and (d).]


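Definition 2 (Well-founded Skipping) $B \subseteq S \times S$ is a well-founded skipping relation on a
transition system $\mathcal{M} = \langle S, \rightarrow, L \rangle$ iff:

(WFSK1) $\langle \forall s, w \in S : sBw : L.s = L.w \rangle$

(WFSK2) There exist functions $rankt : S \times S \rightarrow W$ and
$rankl : S \times S \times S \rightarrow \omega$ such that $\langle W, \prec \rangle$ is well-founded and

$\langle \forall s, u, w \in S : s \rightarrow u \land sBw :$

(a) $\langle \exists v : w \rightarrow v : uBv \rangle \ \lor$

(b) $(uBw \land rankt(u, w) \prec rankt(s, w)) \ \lor$

(c) $\langle \exists v : w \rightarrow v : sBv \land rankl(v, s, u) < rankl(w, s, u) \rangle \ \lor$

(d) $\langle \exists v : w \rightarrow^{\geq 2} v : uBv \rangle \rangle$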

In the above definition, conditions (WFSK2a) to (WFSK2c) require reasoning only about single steps
$\rightarrow$ of the transition system. But condition (WFSK2d) requires us to check that there exists a $v$
such that $v$ is reachable from $w$ in two or more steps and $uBv$ holds. Reasoning about reachability, in
general, is not local. However, for the kinds of optimized systems we are interested in, the number of
abstract steps that a concrete step corresponds to is bounded by a constant, a bound determined early on in
the design phase. For example, the maximum number of abstract steps that a concrete step of a superscalar
processor can correspond to is the number of instructions that the designer decides to retire in a single
cycle. Therefore, for such systems we can still reason using “local” methods. Furthermore, when this constant
is a “small” number, condition (WFSK2d) can be checked by simply unrolling the transition relation of the
concrete system, an observation that we exploit in our first two case studies. On the other hand, this
simplification is not always possible. For example, in the optimized DES system described above, the number
of abstract steps that the optimized DES can take corresponds to the difference between the current time and
the earliest time an event is scheduled for execution. This difference cannot be bounded a priori by a
constant.
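For a finite transition system, the conditions of Definition 2 with a bounded version of (WFSK2d) can be checked by brute force. The sketch below is a Python illustration under stated assumptions, not the proof method used in ACL2s: the names `is_wfsk` and `reach_between`, the use of natural numbers for both rank functions, and the toy counter systems are all hypothetical.

```python
def reach_between(succ, w, lo, hi):
    """States reachable from w in exactly n steps for some lo <= n <= hi."""
    frontier, found = {w}, set()
    for n in range(1, hi + 1):
        frontier = {v for x in frontier for v in succ[x]}
        if n >= lo:
            found |= frontier
    return found


def is_wfsk(succ, label, B, rankt, rankl, bound):
    """Check WFSK1 and WFSK2 on a finite system; (d) uses 2..bound steps."""
    if any(label[s] != label[w] for (s, w) in B):
        return False  # WFSK1 fails
    for (s, w) in B:
        for u in succ[s]:
            a = any((u, v) in B for v in succ[w])
            b = (u, w) in B and rankt(u, w) < rankt(s, w)
            c = any((s, v) in B and rankl(v, s, u) < rankl(w, s, u)
                    for v in succ[w])
            d = any((u, v) in B for v in reach_between(succ, w, 2, bound))
            if not (a or b or c or d):
                return False
    return True


# Disjoint union of an abstract counter ("a", 0..3) and a concrete counter
# ("c", ...) that jumps from 0 straight to 2 (hypothetical toy systems).
succ = {("a", 0): {("a", 1)}, ("a", 1): {("a", 2)}, ("a", 2): {("a", 3)},
        ("a", 3): {("a", 3)},
        ("c", 0): {("c", 2)}, ("c", 2): {("c", 3)}, ("c", 3): {("c", 3)}}
label = {state: state[1] for state in succ}
B = {(("c", n), ("a", n)) for n in (0, 2, 3)}
zero = lambda *args: 0  # ranks play no role here: nothing stutters

with_skipping = is_wfsk(succ, label, B, zero, zero, bound=2)
without_skipping = is_wfsk(succ, label, B, zero, zero, bound=1)
print(with_skipping, without_skipping)
```

With `bound=1` case (d) is effectively disabled, so the jump of the concrete counter from 0 to 2 cannot be matched; allowing two abstract steps is enough for the check to succeed.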

We now define the notion of skipping refinement, a notion that relates two transition systems: an
abstract transition system and a concrete transition system. In order to define skipping refinement, we
make use of refinement maps, functions that map states of the concrete system to states of the abstract
system. Informally, if the concrete system is a skipping refinement of the abstract system, then its
observable behaviors are also behaviors of the abstract system modulo skipping (which includes stuttering). For
example, in our running example of DES, if the refinement map is the identity function, then it is easy
to see that any behavior of the optimized system is a behavior of the abstract system modulo skipping.
In practice, the abstract system and the concrete system are described at different levels of abstraction.
Refinement maps along with the labeling function enable us to define what is observable at concrete
states from the viewpoint of the abstract system.

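Definition 3 (Skipping Refinement) Let $\mathcal{M}_A = \langle S_A, \xrightarrow{A}, L_A \rangle$ and
$\mathcal{M}_C = \langle S_C, \xrightarrow{C}, L_C \rangle$ be transition systems and let
$r : S_C \rightarrow S_A$ be a refinement map. We say $\mathcal{M}_C$ is a skipping refinement of
$\mathcal{M}_A$ with respect to $r$, written $\mathcal{M}_C \lesssim_r \mathcal{M}_A$, if there exists a
relation $B \subseteq S_C \times S_A$ such that all of the following hold.

1. $\langle \forall s \in S_C :: sBr.s \rangle$ and

2. $B$ is a WFSK on $\langle S_C \uplus S_A, \xrightarrow{C} \uplus \xrightarrow{A}, \mathcal{L} \rangle$,
where $\mathcal{L}.s = L_A(s)$ for $s \in S_A$, and $\mathcal{L}.s = L_A(r.s)$ for $s \in S_C$.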

Well-founded skipping gives us a simple proof rule to determine if a concrete transition system
$\mathcal{M}_C$ is a skipping refinement of an abstract transition system $\mathcal{M}_A$ with respect to a
refinement map $r$. Given a refinement map $r : S_C \rightarrow S_A$ and a relation
$B \subseteq S_C \times S_A$, we check the following two conditions: (a) for all $s \in S_C$, $sBr.s$, and
(b) $B$ is a WFSK on the disjoint union of $\mathcal{M}_C$ and $\mathcal{M}_A$. If (a) and (b) hold, then
$\mathcal{M}_C \lesssim_r \mathcal{M}_A$. For a more detailed exposition of skipping refinement we refer the
reader to our companion paper [10].

Notice that we place no restrictions on refinement maps. When refinement is used in specific contexts
it is often useful to place restrictions on what a refinement map can do, e.g., we may require for every
$s \in S_C$ that $L_A(r.s)$ is a projection of $L_C(s)$. The generality of refinement maps is useful in all
three case studies considered in this paper, where a simple refinement map that is a projection function
would not have sufficed.

4 Case Studies 

We consider three case studies. The first case study is a hardware implementation of a JVM-inspired stack
machine with an instruction buffer. The second case study is a memory controller with an optimization
that eliminates redundant writes to memory. The third case study is a compiler transformation that
vectorizes a list of scalar instructions. For each case study we model the abstract system and the concrete
system in ACL2s. We define an appropriate refinement map and prove that the implementation refines
the specification using well-founded skipping simulation.

We first briefly list some conventions used to describe the syntax and the semantics of the systems.
Adding element $e$ to the beginning or end of a list (or an array) $l$ is denoted by $e :: l$ and $l :: e$,
respectively. Each transition consists of a state : condition_1, ..., condition_n pair above a line, followed
by the next state below the line. If a concrete state matches the state in a transition and satisfies each of
the conditions, then the state can transition to the state below the line. We formalize the operational
semantics of the machines by describing the effect of each instruction on the state of the machine. The
proof scripts are publicly available [1].

4.1 JVM-inspired Stack Machine 

In this case study, we verify a stack machine inspired by the Java Virtual Machine (JVM). Java
processors have been proposed as an alternative to just-in-time compilers to improve the performance of Java
programs. Java processors such as JME |P| fetch bytecodes from an instruction memory and store them
in an instruction buffer. The bytecodes in the buffer are analyzed to perform instruction-level
optimizations, e.g., instruction folding. In this case study, we verify BSTK, a simple hardware implementation of
part of the JVM. BSTK is an incomplete and inaccurate model of the JVM that only models an instruction
memory, an instruction buffer, and a stack. Only a small subset of JVM instructions is supported (push,
pop, top, nop). However, even such a simple model is sufficient to exhibit the applicability of skipping
simulation and the limitations of current hardware model-checking tools.

STK is the high-level specification with respect to which we verify the correctness of BSTK, the
implementation. Their behaviors are defined using abstract transition systems. The syntax and the
operational semantics are shown in Fig. 1.

The state of STK consists of an instruction memory imem, a program counter pc, and a stack stk.
An instruction is one of push, pop, top, and nop. We use the listof combinator in defdata to
encode the instruction memory as a list of instructions and the stack as a list of elements [6]. The program
counter is encoded as a natural number using the primitive data type nat. We then compound these
components to encode the state of STK using the defdata record construct. The defdata framework
introduces a constructor function sstate, a set of accessor functions for each field (e.g., sstate-imem),
a recognizer function sstatep identifying the state, an enumerator nth-sstate, and several useful
theorems to reason about compositions of these functions.

(defdata el all)
(defdata stack (listof el))
(defdata inst (oneof (list 'pop) (list 'top)
                     (list 'nop) (list 'push el)))
(defdata inst-mem (listof inst))

(defdata sstate (record (imem . inst-mem)
                        (pc . nat)
                        (stk . stack)))

STK fetches an instruction from the imem, executes it, increments the pc by 1, and possibly modifies
the stk, as outlined in Fig. 1. Since STK is a deterministic machine, we formalize its transition relation
using a function spec-step, which uses an auxiliary function stk-step-inst to capture the effect of
stk    := [] | el :: stk                      (Stack)
inst   := (push e) | (pop) | (top) | (nop)    (Instruction)
imem   := [] | inst :: imem                   (Program)
pc     := 0 | 1 | ... | n | ...               (Program Counter)
ibuf   := [inst_1, ..., inst_k]               (Instruction Buffer)
sstate := (imem, pc, stk)                     (STK State)
istate := (imem, pc, ibuf, stk)               (BSTK State)

STK ($\xrightarrow{A}$) where s = capacity of stk, t = |stk|

(imem, pc, stk) : imem[pc] = (push v), t < s
--------------------------------------------
(imem, pc + 1, v :: stk)

(imem, pc, stk) : imem[pc] = (push v), t = s
--------------------------------------------
(imem, pc + 1, stk)

(imem, pc, []) : imem[pc] = (pop)
---------------------------------
(imem, pc + 1, [])

(imem, pc, v :: stk) : imem[pc] = (pop)
---------------------------------------
(imem, pc + 1, stk)

(imem, pc, stk) : imem[pc] = (top)
----------------------------------
(imem, pc + 1, stk)

(imem, pc, stk) : imem[pc] = (nop)
----------------------------------
(imem, pc + 1, stk)

(imem, pc, stk) : imem[pc] = nil
--------------------------------
(imem, pc + 1, stk)

BSTK ($\xrightarrow{C}$) where k = capacity of ibuf, m = |ibuf|

(imem, pc, ibuf, stk) : m < k, imem[pc] $\neq$ (top)
----------------------------------------------------
(imem, pc + 1, ibuf :: imem[pc], stk)

(imem, pc, ibuf, stk) : imem[pc] = (top),
    (ibuf, 0, stk) $\xrightarrow{A}{}^{m}$ (ibuf, m, stk')
----------------------------------------------------
(imem, pc + 1, [], stk')

(imem, pc, ibuf, stk) : imem[pc] = nil,
    (ibuf, 0, stk) $\xrightarrow{A}{}^{m}$ (ibuf, m, stk')
----------------------------------------------------
(imem, pc + 1, [], stk')

(imem, pc, ibuf, stk) : m = k,
    (ibuf, 0, stk) $\xrightarrow{A}{}^{m}$ (ibuf, m, stk')
----------------------------------------------------
(imem, pc + 1, [imem[pc]], stk')

Figure 1: Syntax and Semantics of Stack and Buffered Stack Machine
executing an instruction on the stack. We are now ready to define the transition system of the STK
machine. The set of states $S_A$ in the transition system $\mathcal{M}_A$ is the set of all states satisfying
the predicate sstatep. Two states $s, u \in S_A$ are related by the transition relation $\xrightarrow{A}$ iff
it is possible in one step to transition from $s$ to $u$, i.e., u = (spec-step s). The labeling function
$L_A$ is the identity function.

(defun stk-step-inst (inst stk)
  "returns next state of stk"
  (let ((op (car inst)))
    (cond ((equal op 'push)
           (mpush (cadr inst) stk))
          ((equal op 'pop)
           (mpop stk))
          ((equal op 'top)
           (mtop stk))
          (t stk))))

(defun spec-step (s)
  "single step of STK machine"
  (let* ((pc (sstate-pc s))
         (imem (sstate-imem s))
         (inst (nth pc imem))
         (stk (sstate-stk s)))
    (if (instp inst)
        (sstate imem (1+ pc) (stk-step-inst inst stk))
      (sstate imem (1+ pc) stk))))

The state of BSTK is similar to STK, except that it also includes an instruction buffer ibuf. The
instruction buffer is encoded as a list of instructions with an additional restriction on its capacity
(ibuf-capacity). To encode ibuf in the defdata framework, we have at least two choices. We can
use the oneof defdata construct to encode it as an empty list or a list of one, two, or three instructions.
Another way is to use the capability of the defdata framework to define custom data types. In the latter
case, we first define a recognizer function inst-buffp and an enumerator function nth-inst-buff.
(defun inst-buffp (l)
  (and (inst-memp l)
       (<= (len l) (ibuf-capacity))))

(defun nth-inst-buff (n)
  (let ((imem (nth-inst-mem n)))
    (if (<= (len imem) (ibuf-capacity))
        imem
      (let ((i1 (car imem))
            (i2 (cadr imem))
            (i3 (caddr imem)))
        (list i1 i2 i3)))))

We can now register our custom type inst-buff using the register-custom-type macro. Once we
have registered it as a defdata type, we can use it just like any other type introduced directly with the
defdata construct.

(register-custom-type inst-buff :enumerator nth-inst-buff
                                :predicate inst-buffp)

We can now define the state of the BSTK machine using the defdata record construct.

(defdata istate
  (record (imem . inst-mem)
          (pc . nat)
          (stk . stack)
          (ibuf . inst-buff)))

BSTK fetches an instruction from the instruction memory, and if the instruction fetched is not top
and the instruction buffer is not full (function stutterp below), it queues the fetched instruction to the
end of the instruction buffer and increments the program counter. If the instruction buffer is full, then
the machine executes all buffered instructions in the order they were enqueued, thereby draining the
instruction buffer and obtaining a new stack. It also updates the instruction buffer so that it contains
only the currently fetched instruction. If none of the transition rules match, then BSTK drains the
instruction buffer (if it is not empty) and updates the stack accordingly. Since BSTK is also a deterministic
machine, we encode its transition relation ($\xrightarrow{C}$) as the function impl-step. Having defined the
transition relation and the state of the BSTK machine, we can define its transition system $\mathcal{M}_C$.

(defun stutterp (inst ibuf)
  "BSTK stutters if ibuf is not full and the current instruction is not 'top"
  (and (< (len ibuf) (ibuf-capacity))
       (not (equal (car inst) 'top))))

(defun impl-step (s)
  "single step of BSTK"
  (let* ((stk (istate-stk s))
         (ibuf (istate-ibuf s))
         (imem (istate-imem s))
         (pc (istate-pc s))
         (inst (nth pc imem)))
    (if (instp inst)
        (let ((nxt-pc (1+ pc))
              (nxt-stk (if (stutterp inst ibuf)
                           stk
                         (impl-observable-stk-step stk ibuf)))
              (nxt-ibuf (if (stutterp inst ibuf)
                            (impl-internal-ibuf-step inst ibuf)
                          (impl-observable-ibuf-step inst))))
          (istate imem nxt-pc nxt-stk nxt-ibuf))
      (let ((nxt-pc (1+ pc))
            (nxt-stk (impl-observable-stk-step stk ibuf))
            (nxt-ibuf nil))
        (istate imem nxt-pc nxt-stk nxt-ibuf)))))

Before we describe the correctness of BSTK based on skipping refinement, we first discuss why an
existing notion of refinement such as stuttering refinement lIT^ will not suffice. If BSTK takes a step
that requires it to drain its instruction buffer (the buffer is full or the current instruction fetched is
top), then the stack will be updated to reflect the execution of all instructions in ibuf, something that is
neither a stuttering step nor a single transition of the STK system. Therefore, it is not possible to prove
that BSTK refines STK using stuttering refinement and a refinement map that does not transform the
stack.

We now formulate the correctness of BSTK based on the notion of skipping refinement. We show
$\mathcal{M}_C \lesssim_r \mathcal{M}_A$ using Definition 2. We define the refinement map, but first we note
that we do not have to consider all syntactically well-formed BSTK states. We only have to consider states
whose instruction buffer is consistent with the contents of the instruction memory, so-called good states
lITSl . One way of defining a good state is as follows: state $s$ is good iff $pc \geq |ibuf|$ and stepping
BSTK from state $\langle imem, pc - |ibuf|, [], stk \rangle$ for $|ibuf|$ steps yields state $s$, where
$|ibuf|$ is the number of instructions in the instruction buffer of state $s$. We define a predicate
good-statep recognizing a good state and show that the set of good states is closed under the transition
relation of BSTK.


(defun commited-state (s)
  (let* ((stk (istate-stk s))
         (imem (istate-imem s))
         (ibuf (istate-ibuf s))
         (pc (istate-pc s))
         (cpc (nfix (- pc (len ibuf)))))
    (istate imem cpc stk nil)))

(defun good-statep (s)
  "if state s is reachable from a commited-state in |ibuf| steps"
  (let ((pc (istate-pc s))
        (ibuf (istate-ibuf s)))
    (and (istatep s)
         (>= pc (len ibuf))
         (let* ((cms (commited-state s))
                (s-cms (cond ((endp ibuf)
                              cms)
                             ((endp (cdr ibuf))
                              (impl-step cms))
                             ((endp (cddr ibuf))
                              (impl-step (impl-step cms)))
                             ((endp (cdddr ibuf))
                              (impl-step (impl-step (impl-step cms))))
                             (t cms))))
           (equal s-cms s)))))

(defthm good-state-inductive
  (implies (good-statep s)
           (good-statep (impl-step s))))

The refinement map ref-map, a function from the set of good states to the set of abstract states (sstatep),
is defined as follows.

(defun ref-map (s)
  (let* ((stk (istate-stk s))
         (imem (istate-imem s))
         (pc (istate-pc s))
         (ibuf (istate-ibuf s))
         (ibuflen (len ibuf))
         (rpc (nfix (- pc ibuflen))))
    (sstate imem rpc stk)))

Given ref-map, we define $B$ to be the binary relation induced by it, i.e., $sBw$ iff $s$ is a good state and
w = (ref-map s).

Now observe that when the instruction buffer is full or the current instruction is top, one step of BSTK
corresponds to the largest number of STK steps. In both cases, the BSTK machine executes all instructions
in the instruction buffer, and if the current instruction is top, it executes it as well. Condition WFSK2d
in Definition 2, which requires us to reason about reachability, can therefore easily be reduced to bounded
reachability. Hence, we set $j = k + 2$, where $k$ is the capacity of the instruction buffer, and condition
WFSK2d becomes $\langle \exists v : w \rightarrow^{<j} v : uBv \rangle$.

Since STK and BSTK are deterministic machines and STK does not stutter, we only need to define
one rank function, a function from the set of good states to the non-negative integers.


(defun rank (s) 

"rank of an istate s is capacity of ibuf - |ibuf|" 
(- (ibuf-capacity) (len (istate-ibuf s)))) 


With the above observations, we simplify WFSK2 (Definition 2) to the following condition.
For all s, u such that s and u are good states and u = (impl-step s):

$(\text{ref-map } s) \xrightarrow{A}{}^{<k+2} (\text{ref-map } u) \ \lor \ ((\text{ref-map } u) = (\text{ref-map } s) \land (\text{rank } u) < (\text{rank } s))$   (1)

Notice that since BSTK is deterministic, u is a function of s, so we can remove u from the above
formula. Since $k + 2$ is a constant, we can expand out $\rightarrow^{<k+2}$ using only $\rightarrow$ instead.
We formalize Equation 1 in ACL2s by first defining a function spec-step-skip-rel, which takes as input STK
states v and w and returns true only if v is reachable from w in at most (ibuf-capacity) + 1 steps.

(defthm bstk-skip-refines-stk 
(implies (and (good-statep s) 

(equal w (ref-map s)) 

(equal u (impl-step s)) 

(not (and (equal w (ref-map u)) 

(< (rank u) (rank s))))) 

(spec-step-skip-rel w (ref-map u)))) 
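The shape of this theorem can be exercised on concrete runs before attempting a proof. The following Python sketch mirrors spec-step, impl-step, ref-map, and rank as described above, and checks Formula 1 along one run; it is an illustration and not the ACL2s model. The capacities `CAP` and `STK_CAP`, the tuple encoding of states, and the behavior of the helper standing in for impl-observable-ibuf-step (empty buffer after top, as in Fig. 1) are assumptions.

```python
CAP = 3      # instruction buffer capacity (assumption)
STK_CAP = 4  # stack capacity (hypothetical value)


def stk_inst(inst, stk):
    """Effect of one instruction on the stack, following Fig. 1."""
    if inst[0] == "push":
        return (inst[1],) + stk if len(stk) < STK_CAP else stk
    if inst[0] == "pop":
        return stk[1:]
    return stk  # top and nop leave the stack unchanged


def fetch(imem, pc):
    return imem[pc] if pc < len(imem) else None


def spec_step(s):
    imem, pc, stk = s
    inst = fetch(imem, pc)
    return (imem, pc + 1, stk if inst is None else stk_inst(inst, stk))


def drain(stk, ibuf):
    for inst in ibuf:
        stk = stk_inst(inst, stk)
    return stk


def impl_step(s):
    imem, pc, stk, ibuf = s
    inst = fetch(imem, pc)
    if inst is None:
        return (imem, pc + 1, drain(stk, ibuf), ())
    if len(ibuf) < CAP and inst[0] != "top":  # the stutterp case
        return (imem, pc + 1, stk, ibuf + (inst,))
    nxt_ibuf = () if inst[0] == "top" else (inst,)
    return (imem, pc + 1, drain(stk, ibuf), nxt_ibuf)


def ref_map(s):
    imem, pc, stk, ibuf = s
    return (imem, max(pc - len(ibuf), 0), stk)


def rank(s):
    return CAP - len(s[3])


def skip_distance(w, v, bound):
    """Smallest n in 1..bound with v reached from w in n spec steps."""
    for n in range(1, bound + 1):
        w = spec_step(w)
        if w == v:
            return n
    return None


def check_run(imem, steps):
    """Per step: 0 for a stutter, else the number of matching STK steps."""
    s, distances = (imem, 0, (), ()), []
    for _ in range(steps):
        u = impl_step(s)
        w, v = ref_map(s), ref_map(u)
        if v == w and rank(u) < rank(s):
            distances.append(0)
        else:
            distances.append(skip_distance(w, v, CAP + 1))
        s = u
    return distances


prog = (("push", 1), ("push", 2), ("pop",), ("push", 3),
        ("top",), ("nop",), ("pop",))
result = check_run(prog, 9)
print(result)
```

A `None` entry in the result would indicate a step that is neither a stutter nor matched within CAP + 1 STK steps. Such testing only samples behaviors; the defthm above covers all good states.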

Once the definitions were in place, proving bstk-skip-refines-stk with ACL2s was
straightforward. Next, we evaluated how amenable SKS is to automated reasoning, i.e., using only symbolic
simulation and no additional lemmas. We modeled BSTK with instruction buffer capacities of 2, 3, and 4,
while no other restrictions were placed on the machines. In particular, the instruction memory (imem)
and the stack (stk) components of the state for the BSTK and STK machines are unbounded. The
experiments were run on a 2.2 GHz Intel Core i7 with 16 GB of memory. For BSTK with an instruction buffer
capacity of 2 instructions, it took ~12 minutes to complete the proof, and for BSTK with an instruction
buffer capacity of 3 instructions, it took ~2 hours. For BSTK with an instruction buffer capacity of 4
instructions, the proof did not finish in over 3 hours.


4.2 Memory Controller 

A memory controller is an interface between a CPU and a memory, and synchronizes communication
between them. Designers implement several optimizations in a memory controller to maximize available
memory bandwidth utilization and reduce the latency of memory accesses, which are known bottlenecks for
program performance. In this case study, we consider OptMEMC, a simple model of such an
optimized memory controller. In our simplified model, a CPU is modeled as a list of memory requests (reqs)
and memory as a list of natural numbers (mem).

OptMEMC fetches a memory request from location pt in a queue of CPU requests, reqs. It enqueues
the fetched request in the request buffer, rbuf, and increments pt to point to the next CPU request in
reqs. The capacity of rbuf is k, a fixed positive integer. If the fetched request is a read or the request
buffer is full, then before enqueuing the request into rbuf, OptMEMC first analyzes the request buffer
for consecutive write requests to the same address in the memory (mem). If such a pair of writes exists
in the buffer, it marks the older write request in the request buffer as redundant. Then it executes
all the requests in the request buffer except the ones that are marked redundant. Requests in the buffer
are executed in the order they were enqueued. In addition to read and write commands, the memory
controller periodically issues a refresh command to preserve data in memory. A refresh command reads
all memory locations and immediately writes them back without modification. Refresh commands are
required to periodically reinforce the charge in the capacitive storage cells in a DRAM. In effect, a
refresh command leaves the data memory unchanged. We define the function mrefresh and prove that
the memory is the same before and after execution of the refresh command. This is the only property of
mrefresh that we require.

(defthm mrefresh-mem-unchanged 
(equal (mrefresh mem) 
mem)) 
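The redundant-write optimization described above can be illustrated with a short Python sketch. It is not the ACL2s definition: the function names, the tuple encoding of requests, and the reading of "consecutive" as "adjacent in the buffer" are assumptions made for the example.

```python
def execute(reqs, mem):
    """Apply requests in order; only writes change the memory."""
    mem = list(mem)
    for req in reqs:
        if req[0] == "write":
            _, addr, val = req
            mem[addr] = val
        # read and refresh leave the memory contents unchanged
    return mem


def drop_redundant_writes(rbuf):
    """Drop a write when the next buffered request writes the same address."""
    kept = []
    for i, req in enumerate(rbuf):
        nxt = rbuf[i + 1] if i + 1 < len(rbuf) else None
        redundant = (req[0] == "write" and nxt is not None
                     and nxt[0] == "write" and nxt[1] == req[1])
        if not redundant:
            kept.append(req)
    return kept


rbuf = [("write", 0, 5), ("write", 0, 7), ("read", 0),
        ("write", 1, 9), ("refresh",)]
mem = [0, 0, 0]
optimized = drop_redundant_writes(rbuf)
print(optimized)
print(execute(rbuf, mem), execute(optimized, mem))
```

Dropping the older of two adjacent writes to the same address is safe because no request between them can observe the overwritten value, so both executions end with the same memory.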


To reason about the correctness of OptMEMC using skipping refinement, we define a high-level
abstract system, MEMC, that acts as the specification for OptMEMC. It fetches a memory request from
the CPU and immediately executes the request. The syntax and the semantics of MEMC and OptMEMC
are given in Fig. 2, using the same conventions as described previously in the stack machine section.

We now formulate the correctness of OptMEMC based on the notion of skipping refinement. Let
$\mathcal{M}_A = \langle S_A, \xrightarrow{A}, L_A \rangle$ and
$\mathcal{M}_C = \langle S_C, \xrightarrow{C}, L_C \rangle$ be transition systems for MEMC and OptMEMC,
respectively. As in the previous case study, we encode the state of the machines using defdata and formalize
the transition relations of OptMEMC and MEMC using step functions that describe the effect of each
instruction on the state of the machine. The labeling functions $L_A$ and $L_C$ are the identity functions.
Given a refinement map ref-map : $S_C \rightarrow S_A$, we use Definition 2 to show that
$\mathcal{M}_C \lesssim_r \mathcal{M}_A$. As was the case with the previous case study, OptMEMC and MEMC are
deterministic machines and MEMC does not stutter. WFSK2 (Definition 2) can again be simplified to Formula 1.

Once the definitions of the transition systems for the two machines were in place, it was
straightforward to prove skipping refinement with ACL2s. As in the previous case study, we also proved the
theorem using only symbolic execution and no additional lemmas, for configurations of OptMEMC
with buffer capacities of 2 and 3. For OptMEMC with a buffer capacity of 2, the final theorem was proved
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(Memory) 
(Request) 
(Request Loeation) 
(Requests) 
(Request Buffer) 
(MEMC State) 
(OptMEMC State) 


MEMC (4) 

{reqs,pt, mem), reqs\pt] = {write addr v) 

{reqs,pt + 1, mem\addr\ -i— v) 


{reqs,pt, mem), reqs\pt] = {read addr) 

{reqs,pt +1 ,mem) 


{reqs,pt,mem), reqs\pt\ = {refresh) 

{reqs,pt +1 ,mem) 

OptMEMC (-^) 

Eet \rbuf\ =j 

{reqs,pt, rbuf, mem), j < k, req / top 
{reqs,pt, rbuf :: reqs\pt\, mem) 


{reqs,pt,rbuf,mem), reqs\pt\ = {read addr), 

{rbuf ,0,mem) {rbuf ,j,mem') 

{reqs,pt, W,mem') 


{reqs,pt, rbuf, mem), j = k, 

{rbuf ,0,mem) {rbuf ,k, mem') 

{reqs,pt, rbuf:: reqs\pt], mem') 


Eigure 2: Syntax and Semantics of MEMC and OptMEMC 


mem := [] | v::mem 

req := {write addr v) \ {read addr) \ {refresh) 

pt := 0 I 1 I • • • \n \ • • • 

reqs := [] | reqwreqs 

rbuf:= [reqi,...,reqk\ 

sstate := {reqs^pt, mem) 

istate := {reqs,pt,rbuf,mem) 
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in ~ 2 minutes and with OptMEMC buffer capacity of 3, it took ~ 1 hour to prove the final theorem. The 
proof with buffer capacity of 4 instructions did not finish in over 3 hours. 


4.3 Superword Level Parallelism with SIMD instructions 


An effective way to improve the performance of multimedia programs running on modern commodity 
architectures is to exploit Single-Instruction Multiple-Data (SIMD) instructions (e.g., the SSE/AVX 
instructions in x86 microprocessors). Compilers analyze programs for superword level parallelism and 
when possible replace multiple scalar instructions with a compact SIMD instruction that concurrently 
operates on multiple data IfTTl . In this case study, we illustrate the applicability of skipping refinement 
to verify the correctness of such a compiler transformation. 

For the purpose of this case study, we make some simplifying assumptions: the state of the source and 
target programs (modeled as transition systems) is a three-tuple consisting of a sequence of instructions, a 
program counter and a store. We also assume that a SIMD instruction simultaneously operates on two sets 
of data operands and that the transformation analyzes the program at a basic block level. Therefore, we 
do not model any control flow instruction. Fig. 3 shows how two add and two multiply scalar instructions 
are transformed into corresponding SIMD instructions. Notice that the transformation does not reorder 
instructions in the source program. 


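$$
\begin{array}{l} a = b + c \\ d = e + f \end{array}
\;\Longrightarrow\;
\begin{bmatrix} a \\ d \end{bmatrix} = \begin{bmatrix} b \\ e \end{bmatrix} +_{SIMD} \begin{bmatrix} c \\ f \end{bmatrix}
\qquad\qquad
\begin{array}{l} u = v \times w \\ x = y \times z \end{array}
\;\Longrightarrow\;
\begin{bmatrix} u \\ x \end{bmatrix} = \begin{bmatrix} v \\ y \end{bmatrix} \times_{SIMD} \begin{bmatrix} w \\ z \end{bmatrix}
$$

Figure 3: Superword Parallelism 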


The syntax and operational semantics of the scalar and vector machines are given in Fig. 4, using 
the same conventions as described previously in the stack machine section. We denote that $x, \ldots, y$ are 
variables with values $v_x, \ldots, v_y$ in store by $\{\langle x, v_x \rangle, \ldots, \langle y, v_y \rangle\} \subseteq store$. We use $[\![(sop\ v_x\ v_y)]\!]$ to denote 
the result of a scalar operation $sop$ and $[\![(vop\ \langle v_a\ v_b \rangle \langle v_d\ v_e \rangle)]\!]$ to denote the result of a vector operation 
$vop$. Finally, we use $store|_{x := v_x, \ldots, y := v_y}$ to denote that variables $x, \ldots, y$ are updated (or added) to store 
with values $v_x, \ldots, v_y$. Notice that the language of a source program consists of scalar instructions while 
the language of the target program consists of both scalar and vector instructions. As in the previous two 
case studies, we model the transition relation of a program (both source and target program) by modeling 
the effect of an instruction on the state of machines. 
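
The step functions described above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the ACL2s definitions from the case study: the tuple encodings of instructions and states, the dictionary used for the store, and the names `exec_scalar`, `scalar_step` and `vector_step` are hypothetical, and only add, sub and mul (with their vector counterparts) are modeled.

```python
import operator

# Assumed encodings (hypothetical): a scalar instruction is (sop, z, x, y);
# a vector instruction is (vop, (c, a, b), (f, d, e)); a state is (prg, pc, store).
SCALAR_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
VECTOR_OPS = {"vadd": "add", "vsub": "sub", "vmul": "mul"}

def exec_scalar(inst, store):
    """Return a copy of store in which z is updated to [[sop v_x v_y]]."""
    sop, z, x, y = inst
    updated = dict(store)
    updated[z] = SCALAR_OPS[sop](store[x], store[y])
    return updated

def scalar_step(state):
    """One step of the scalar machine (state unchanged past the program end)."""
    sprg, pc, store = state
    if pc >= len(sprg):
        return state
    return (sprg, pc + 1, exec_scalar(sprg[pc], store))

def vector_step(state):
    """One step of the vector machine: a scalar or a two-lane vector instruction."""
    vprg, pc, store = state
    if pc >= len(vprg):
        return state
    inst = vprg[pc]
    if inst[0] in VECTOR_OPS:
        vop, (c, a, b), (f, d, e) = inst
        op = SCALAR_OPS[VECTOR_OPS[vop]]
        updated = dict(store)
        # Both lanes read their operands from the store before either writes.
        updated[c], updated[f] = op(store[a], store[b]), op(store[d], store[e])
        return (vprg, pc + 1, updated)
    return (vprg, pc + 1, exec_scalar(inst, store))

store = {"b": 1, "c": 2, "e": 3, "f": 4}
sprg = (("add", "a", "b", "c"), ("add", "d", "e", "f"))
vprg = (("vadd", ("a", "b", "c"), ("d", "e", "f")),)
after_scalar = scalar_step(scalar_step((sprg, 0, store)))  # two scalar steps
after_vector = vector_step((vprg, 0, store))               # one vector step
```

On the add example of Fig. 3, two scalar steps and one vector step produce the same store, while the program counters differ (2 versus 1). The refinement map has to account for exactly this difference.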

We use the translation validation approach to verify the correctness of the vectorizing compiler 
transformation 14], i.e., we prove the equivalence between a source program and the generated vector program. 
As in the previous two case studies, the notion of stuttering simulation is too strong to relate a scalar 
program and the vector program produced by the vectorizing compiler, no matter what refinement map we 
use. To see this, note that the vector machine might run exactly twice as fast as the scalar machine, and 
during each step the scalar machine might be modifying the memory. Since neither machine stutters, 
in order to use stuttering refinement, the length of the vector machine run has to be equal to the length 
of the scalar machine run. 

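Let $\mathcal{M}_A = \langle S_A, \xrightarrow{A}, L_A \rangle$ and $\mathcal{M}_C = \langle S_C, \xrightarrow{C}, L_C \rangle$ be transition systems of the scalar and vector 
machines, respectively, corresponding to the source and target programs. The vector program is correct iff 
$\mathcal{M}_C$ refines $\mathcal{M}_A$. We show $\mathcal{M}_C \lesssim_r \mathcal{M}_A$ using Definition 2. Determining $j$, an upper bound on skipping 
that reduces condition WFSK2d in Definition 2 to bounded reachability, is simple because the vector 
machine can perform at most 2 steps of the scalar machine at once; therefore $j = 3$ suffices. 

$$
\begin{array}{lcll}
loc & := & \{x, y, z, a, b, c, \ldots\} & \text{(Variables)} \\
sop & := & add \mid sub \mid mul \mid and \mid or \mid nop & \text{(Scalar Ops)} \\
vop & := & vadd \mid vsub \mid vmul \mid vand \mid vor \mid vnop & \text{(Vector Ops)} \\
sinst & := & sop\langle z\ x\ y \rangle & \text{(Scalar Inst)} \\
vinst & := & vop\langle c\ a\ b \rangle \langle f\ d\ e \rangle & \text{(Vector Inst)} \\
sprg & := & [\,] \mid sinst :: sprg & \text{(Scalar Program)} \\
vprg & := & [\,] \mid (sinst \mid vinst) :: vprg & \text{(Vector Program)} \\
store & := & [\,] \mid \langle x, v_x \rangle :: store & \text{(Registers)}
\end{array}
$$

Scalar Machine ($\xrightarrow{A}$)

$$
\frac{\langle sprg, pc, store \rangle,\ \{\langle x, v_x \rangle, \langle y, v_y \rangle\} \subseteq store,\ 
      sprg[pc] = sop\langle z\ x\ y \rangle,\ v_z = [\![(sop\ v_x\ v_y)]\!]}
     {\langle sprg, pc + 1, store|_{z := v_z} \rangle}
$$

Vector Machine ($\xrightarrow{C}$)

$$
\frac{\langle vprg, pc, store \rangle,\ \{\langle x, v_x \rangle, \langle y, v_y \rangle\} \subseteq store,\ 
      vprg[pc] = sop\langle z\ x\ y \rangle,\ v_z = [\![(sop\ v_x\ v_y)]\!]}
     {\langle vprg, pc + 1, store|_{z := v_z} \rangle}
$$

$$
\frac{\langle vprg, pc, store \rangle,\ vprg[pc] = vop\langle c\ a\ b \rangle \langle f\ d\ e \rangle,\ 
      \{\langle a, v_a \rangle, \langle b, v_b \rangle, \langle d, v_d \rangle, \langle e, v_e \rangle\} \subseteq store,\ 
      \langle v_c, v_f \rangle = [\![(vop\ \langle v_a\ v_b \rangle \langle v_d\ v_e \rangle)]\!]}
     {\langle vprg, pc + 1, store|_{c := v_c, f := v_f} \rangle}
$$

Figure 4: Syntax and Semantics of Scalar and Vector Program 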

We next define the refinement map. Recall that refinement maps are used to define what is observable 
at concrete states from the viewpoint of the abstract system. Let sprg be the source program and vprg be the 
compiled vector program. We first define a function pcT that takes as input the vector machine's program 
counter pc and a vector program vprg and returns the corresponding value of the scalar machine's 
program counter. 


(defun num-scaler-inst (inst)
  (cond ((vecinstp inst) 2)
        ((instp inst) 1)
        (t 0)))

(defun pcT (pc vprg)
  "maps values of the vector machine's pc to the corresponding values of
   the scalar machine's pc"
  (let ((inst (nth pc vprg)))
    (cond ((or (not (integerp pc))
               (< pc 0))
           0)
          ((zp pc)
           (num-scaler-inst inst))
          (t (+ (num-scaler-inst inst) (pcT (1- pc) vprg))))))
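
The effect of pcT is easiest to see on a small program. The Python sketch below mirrors the recursion above with an equivalent sum over positions 0 to pc; the tuple encoding of instructions and the names `num_scalar_inst` and `pc_t` are hypothetical, and the sketch assumes a well-formed program, so it has no counterpart of the `(t 0)` case.

```python
# Assumed encoding (hypothetical): an instruction is a tuple whose first
# component is its opcode; vector opcodes are the ones listed in VOPS.
VOPS = {"vadd", "vsub", "vmul"}

def num_scalar_inst(inst):
    """Number of scalar instructions that one instruction stands for."""
    return 2 if inst[0] in VOPS else 1

def pc_t(pc, vprg):
    """Scalar instructions covered by vprg[0..pc] (inclusive); 0 if pc < 0."""
    if pc < 0:
        return 0
    return sum(num_scalar_inst(inst) for inst in vprg[:pc + 1])

vprg = [("vadd", ("a", "b", "c"), ("d", "e", "f")),
        ("sub", "g", "a", "d"),
        ("vmul", ("u", "a", "g"), ("x", "d", "d"))]

# Scalar pc that corresponds to each vector pc 0, 1, 2, 3 (evaluated at pc - 1).
scalar_pcs = [pc_t(pc - 1, vprg) for pc in range(len(vprg) + 1)]
```

For this program the vector program counters 0, 1, 2, 3 correspond to the scalar program counters 0, 2, 3, 5: each vector instruction advances the scalar counter by two.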

We next define a function scalarize-vprg that takes as input a vector program vprg. It walks 
through the list of instructions in vprg and translates each instruction in one of the following ways: if it 
is a vector instruction it scalarizes it into a list of corresponding scalar instructions, else if it is a scalar 
instruction it returns the list containing the instruction itself (function scalarize below). The result of 
scalarize-vprg is a scalar program. Notice that this function is significantly simpler than the compiler 
transformation procedure. This is because the complexity of a compiler transformation typically lies in 
its analysis phase, which determines if the transformation is even feasible, and not in the transformation 
phase itself. 

(defun scalarize (inst)
  "scalarize a vector instruction"
  (cond ((vecinstp inst)
         (let ((op (vecinst-op inst))
               (ra1 (car (vecinst-ra inst)))
               (ra2 (cdr (vecinst-ra inst)))
               (rb1 (car (vecinst-rb inst)))
               (rb2 (cdr (vecinst-rb inst)))
               (rc1 (car (vecinst-rc inst)))
               (rc2 (cdr (vecinst-rc inst))))
           (case op
             (vadd (list (inst 'add rc1 ra1 rb1)
                         (inst 'add rc2 ra2 rb2)))
             (vsub (list (inst 'sub rc1 ra1 rb1)
                         (inst 'sub rc2 ra2 rb2)))
             (vmul (list (inst 'mul rc1 ra1 rb1)
                         (inst 'mul rc2 ra2 rb2))))))
        ((instp inst) (list inst))
        (t nil)))

(defun scalarize-vprg-aux (pc vprg)
  "scalarize the instructions of the vector program at positions [0,pc]"
  (if (or (not (integerp pc))
          (< pc 0))
      nil
    (let ((inst (nth pc vprg)))
      (cond
       ((zp pc) ; pc = 0
        (scalarize inst))
       (t
        (append (scalarize-vprg-aux (1- pc) vprg) (scalarize inst)))))))

(defun scalarize-vprg (vprg)
  (scalarize-vprg-aux (len vprg) vprg))
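
As an illustration of what scalarize-vprg computes, here is a small Python sketch of the same idea. It is not a transcription of the ACL2 code: it assumes a hypothetical tuple encoding in which a vector instruction groups its operands by lane, as in the notation of Fig. 4, and the names `scalarize` and `scalarize_vprg` are reused only for readability.

```python
# Assumed encoding (hypothetical): a scalar instruction is (sop, z, x, y);
# a vector instruction is (vop, (c, a, b), (f, d, e)), one triple per lane.
VOPS = {"vadd": "add", "vsub": "sub", "vmul": "mul"}

def scalarize(inst):
    """Translate one instruction into a list of scalar instructions."""
    if inst[0] in VOPS:
        vop, lane1, lane2 = inst
        sop = VOPS[vop]
        return [(sop,) + tuple(lane1), (sop,) + tuple(lane2)]
    return [inst]  # a scalar instruction is kept as is

def scalarize_vprg(vprg):
    """Scalarize every instruction of vprg, preserving program order."""
    sprg = []
    for inst in vprg:
        sprg.extend(scalarize(inst))
    return sprg

vprg = [("vadd", ("a", "b", "c"), ("d", "e", "f")),
        ("sub", "g", "a", "d"),
        ("vmul", ("u", "a", "g"), ("x", "d", "d"))]
sprg = scalarize_vprg(vprg)
```

The three-instruction vector program expands into five scalar instructions, with no analysis involved; this matches the observation that the hard part of the compiler transformation is the analysis phase, not the translation.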

The refinement map ref-map : $S_C \to S_A$ can now be defined as follows. 


(defun ref-map (s)
  (let* ((store (vstate-store s))
         (vprg (vstate-vprg s))
         (isapc (pcT (1- (vstate-pc s)) vprg)))
    (sstate isapc store (scalarize-vprg vprg))))


Given ref-map, we define $B$ to be the binary relation induced by the refinement map, i.e., $sBw$ iff $s \in S_C$ 
and $w =$ (ref-map s). Notice that since the machines do not stutter, WFSK2 (Definition 2) can be 
simplified as follows. For all $s, u \in S_C$ such that $s \xrightarrow{C} u$: 

$$
(\text{ref-map } s) \xrightarrow{A}{}^{<3} (\text{ref-map } u) \qquad (2)
$$


Since the vector machine is deterministic, $u$ is a function of $s$, so we can remove $u$ from the above 
formula, if we wish. Also, we can expand out $\xrightarrow{A}{}^{<3}$ to obtain a formula using only $\xrightarrow{A}$ instead. We prove 
the appropriate lemmas to prove the final theorem: the vector machine refines the scalar machine. 

(defthm vprg-skip-refines-sprg
  (implies (and (vstatep s)
                (equal w (ref-map s)))
           (spec-step-skip-rel w (ref-map (vec-step s)))))

where vstatep is the recognizer for a state of the vector machine; vec-step is the transition function of the 
vector machine; and spec-step-skip-rel is a function that takes as input two states of the scalar machine 
and returns true if the second is reachable from the first in fewer than three steps. 
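
The statement of the theorem can be exercised on a concrete run with the following Python sketch. It tests the property along one execution and is in no way a substitute for the ACL2s proof. All names and encodings are hypothetical, only add, sub and mul are modeled, and `skip_rel` reads "fewer than three steps" as one or two scalar steps, which is an assumption of this sketch.

```python
import operator

# Assumed encodings (hypothetical): scalar instruction (sop, z, x, y);
# vector instruction (vop, (c, a, b), (f, d, e)); state (prg, pc, store).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
VOPS = {"vadd": "add", "vsub": "sub", "vmul": "mul"}

def scalarize(inst):
    if inst[0] in VOPS:
        vop, lane1, lane2 = inst
        return [(VOPS[vop],) + lane1, (VOPS[vop],) + lane2]
    return [inst]

def exec_scalar(inst, store):
    sop, z, x, y = inst
    updated = dict(store)
    updated[z] = OPS[sop](store[x], store[y])
    return updated

def scalar_step(state):
    sprg, pc, store = state
    return (sprg, pc + 1, exec_scalar(sprg[pc], store))

def vector_step(state):
    vprg, pc, store = state
    inst = vprg[pc]
    if inst[0] in VOPS:
        vop, (c, a, b), (f, d, e) = inst
        op = OPS[VOPS[vop]]
        updated = dict(store)
        updated[c], updated[f] = op(store[a], store[b]), op(store[d], store[e])
        return (vprg, pc + 1, updated)
    return (vprg, pc + 1, exec_scalar(inst, store))

def ref_map(vstate):
    """Map a vector-machine state to the scalar-machine state it represents."""
    vprg, pc, store = vstate
    sprg = tuple(i for inst in vprg for i in scalarize(inst))
    spc = sum(len(scalarize(inst)) for inst in vprg[:pc])
    return (sprg, spc, store)

def skip_rel(w, target):
    """True iff target is reachable from w in one or two scalar steps."""
    for _ in range(2):
        if w[1] >= len(w[0]):
            return False
        w = scalar_step(w)
        if w == target:
            return True
    return False

vprg = (("vadd", ("a", "b", "c"), ("d", "e", "f")),
        ("sub", "g", "a", "d"),
        ("vmul", ("u", "a", "g"), ("x", "d", "d")))
s = (vprg, 0, {"b": 1, "c": 2, "e": 3, "f": 4})
checked = 0
while s[1] < len(vprg):
    t = vector_step(s)
    assert skip_rel(ref_map(s), ref_map(t))  # instance of the theorem at s
    s, checked = t, checked + 1
```

The vector run takes three steps while the corresponding scalar run takes five, so the two runs cannot be matched step for step; each vector step is instead matched by one or two scalar steps.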

Note that pcT(pc, vprg) can also be determined using a history variable, which would be a preferable 
strategy from the perspective of verification efficiency. 


5 Conclusion and Future Work 


In this paper, we used skipping refinement to prove the correctness of three optimized reactive systems 
in ACL2s. The concrete optimized systems can run “faster” than the corresponding abstract high-level 
specifications. Skipping refinement is an appropriate notion of correctness for reasoning about such 
optimized systems. Furthermore, well-founded skipping simulation gives a “local” proof method that is 
amenable to automated reasoning. Stuttering simulation and bisimulation have been used widely to 
prove correctness of several interesting systems ifldlfTfilfTTl. However, we have shown that these notions 
are too strong to analyze the class of optimized reactive systems studied in this paper. Skipping 
simulation is a weaker and more generally applicable notion than stuttering simulation. In particular, skipping 
simulation can be used to reason about superscalar processors and pipelined processors with multiple 
instruction completion without modifying the specification (ISA), an open problem in ifT^. We refer the 
reader to our companion paper ifTOl for a more detailed discussion of related work. 

For future work, we would like to develop a methodology to increase proof automation for 
proving correctness of systems based on skipping refinement. In lITOl, we showed how model-checkers can 
be used to analyze correctness for finite-state systems. Similarly, we would like to use the GL 
framework lITSl, a verified framework for symbolic execution in ACL2, to further increase efficiency and 
automation. 